Abstract

A robust computable approximation to the nonlinear filtering
problem for a diffusion model is treated, where the system and data
models are given by $dx = f(x)dt + \sigma(x)dz$, $dy = g(x)dt + dw$. The
approximation (with approximation parameter $h$) is robust in the sense
that it is locally Lipschitz continuous in the data $y(\cdot)$ (sup norm)
uniformly in $h$ and, as $h \to 0$, it converges to the optimal filter
for the diffusion.
1.  Introduction

Let $z(\cdot)$, $w(\cdot)$ denote two independent Wiener processes,
the second one with covariance $I \cdot t$, and define the processes
$x(\cdot)$, $y(\cdot)$ by the Itô equations

(1.1)  $dx = f(x)dt + \sigma(x)dz$,  $x(0)$ arbitrary $\in R^r$,

(1.2)  $dy = g(x)dt + dw$,  $y(0) = 0$,  $y(t) \in R^s$,  $t \le T$,

where  T is  an  arbitrary,  but  fixed  positive  number. 

It is assumed for convenience that $f(\cdot)$, $\sigma(\cdot)$, $g(\cdot)$ are
bounded and continuous and that the solution of (1.1) on $C^r[0,T]$ (the
space of $R^r$-valued continuous functions on $[0,T]$) is unique in
the sense of distributions. It is also assumed that the function $g(\cdot)$
has bounded and continuous first and second derivatives and
either that the range of $x(\cdot)$ is bounded or that the derivatives
of $g(\cdot)$ are uniformly continuous. Concerning the first condition,
see the remarks in Section 4. We use $x_t$ and $x(t)$ interchangeably.

Let $F(\cdot)$ denote a bounded continuous real valued function
on $R^r$. We are concerned with robust approximations to the conditional
expectation $E_t F(x_t) = E[F(x_t) \mid y_s, s \le t]$, where the
conditioning is on the $\sigma$-algebra generated by $y_s$, $s \le t$, in the
sense that we want a "good" approximation which is a continuous function
of the data $y(\cdot)$.

Let $\tilde x(\cdot)$ denote a process which is independent of $y(\cdot)$,
but which induces the same measure on $C^r[0,T]$ that $x(\cdot)$ does.
Then it is well known [1], [2] that w.p.1,


The  author  gratefully  acknowledges  several  stimulating  discussions  with 
J.M.C.  Clark,  who  contributed  considerably  to  the  author's  understanding 
of  the  subject. 
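(1.3)  $E_t F(x_t) = \dfrac{\tilde E_t\, F(\tilde x_t) \exp\left[ \int_0^t g'(\tilde x_u)\,dy_u - \frac12 \int_0^t |g(\tilde x_u)|^2 du \right]}{\tilde E_t \exp\left[ \int_0^t g'(\tilde x_u)\,dy_u - \frac12 \int_0^t |g(\tilde x_u)|^2 du \right]}$,

where $\tilde E_t$ denotes expectation over $\tilde x(\cdot)$ given $y(\cdot)$.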

Although stochastic differential equation representations
[1], [2] for $E_t F(x_t)$ are known, the problem of effectively
computing or approximating $E_t F(x_t)$ is still in bad shape. Of
particular interest is a computational or approximation method
that is robust in $y(\cdot)$, in the sense that it is suitably continuous
in $y(\cdot)$, and actually approximates $E_t F(x_t)$ well. Here, we
develop an approach to this problem by combining the
approximation ideas in [3], [4] with the robustness ideas of
Clark in [5]. In particular, Clark showed that (1.4) is also a
version of $E_t F(x_t)$ and that it is locally Lipschitz continuous
in $y(\cdot)$ at each $y(\cdot) \in C^s[0,T]$.

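(1.4)  $E_t F(x_t) = \dfrac{\tilde E_t\, F(\tilde x_t) \exp\left[ y_t' g(\tilde x_t) - \int_0^t y_u'\,dg(\tilde x_u) - \frac12 \int_0^t |g(\tilde x_u)|^2 du \right]}{\tilde E_t \exp\left[ y_t' g(\tilde x_t) - \int_0^t y_u'\,dg(\tilde x_u) - \frac12 \int_0^t |g(\tilde x_u)|^2 du \right]}$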


In a formal sense, (1.4) is obtained from (1.3) by doing an
integration by parts on the stochastic integral in (1.3).
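
The formal step can be written out as follows. This is only a sketch: it assumes that the product rule applies with no cross-variation term between $g(\tilde x_\cdot)$ and $y(\cdot)$ (as one would expect when $\tilde x(\cdot)$ is independent of $y(\cdot)$), and it uses $y(0) = 0$ from (1.2).

```latex
\int_0^t g'(\tilde x_u)\,dy_u
  = g'(\tilde x_t)\,y_t - g'(\tilde x_0)\,y_0 - \int_0^t y_u'\,dg(\tilde x_u)
  = y_t'\,g(\tilde x_t) - \int_0^t y_u'\,dg(\tilde x_u).
```

Substituting the right-hand side for the stochastic integral in the numerator and denominator of (1.3) gives the exponent that appears in (1.4).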

In [3], Kushner developed a computational approach to optimal
control and filtering problems on diffusion models. The basic idea
was to approximate the diffusion in a particular way by an
interpolated discrete parameter Markov chain, and to show that the
minimal cost for the controlled chain converged to that for the
diffusion, as some approximation parameter went to zero. For the
filtering problem the filter for the approximating
chain (but using the actual observational data from (1.2)) similarly
converged to the filter for the diffusion. The interpolation times
were constant in the filtering problem in [3]. We can use the same
approximating process here as was used in [3]. In fact the filter
approximation of [3] can be proved to be robust in the sense of
Theorem 1 also. But it is more convenient notationally to use the
continuous parameter Markov chain approximation $x^h(\cdot)$ to $x(\cdot)$
which was developed by Kushner and DiMasi ([4], Section 8) ($h$ is
an approximation parameter, see below).

Next, let $x^h(\cdot)$ denote any finite state continuous parameter
Markov chain, and let us consider the corresponding filtering
problem. Let the observational data available at $t$ still be
denoted by $y_u$, $u \le t$, where

(1.5)  $dy = g(x^h_t)dt + dw$.


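Then it is well known ([6], [7]) that (1.3) holds w.p.1 with
$x^h(\cdot)$, $\tilde x^h(\cdot)$ replacing $x(\cdot)$, $\tilde x(\cdot)$ resp., where
$\tilde x^h(\cdot)$ has the same distributions as $x^h(\cdot)$ has, but is
independent of $y(\cdot)$; i.e.,

(1.6)  $E_t F(x^h_t) = \dfrac{\tilde E_t\, F(\tilde x^h_t) \exp\left[ \int_0^t g'(\tilde x^h_u)\,dy_u - \frac12 \int_0^t |g(\tilde x^h_u)|^2 du \right]}{\tilde E_t \exp\left[ \int_0^t g'(\tilde x^h_u)\,dy_u - \frac12 \int_0^t |g(\tilde x^h_u)|^2 du \right]}$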


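Due to the piecewise constant nature of $\tilde x^h(\cdot)$, the stochastic
integral can be readily integrated by parts to yield the chain
version of (1.4):

(1.7)  $E_t F(x^h_t) = \dfrac{\tilde E_t\, F(\tilde x^h_t) \exp\left[ g'(\tilde x^h_t) y_t - \int_0^t y_u'\,dg(\tilde x^h_u) - \frac12 \int_0^t |g(\tilde x^h_u)|^2 du \right]}{\tilde E_t \exp\left[ g'(\tilde x^h_t) y_t - \int_0^t y_u'\,dg(\tilde x^h_u) - \frac12 \int_0^t |g(\tilde x^h_u)|^2 du \right]}$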
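
For a piecewise constant path the integration by parts is just a summation by parts, which can be checked numerically. The sketch below is illustrative only: the jump times, the states, the scalar `g` and the data path `y` are all hypothetical choices, and `y` is taken smooth so that both sides can be evaluated exactly.

```python
import math
import random

def lhs_integral(states, times, g, y):
    """Integral of g(x_u) dy_u: g is constant on each stretch between jumps."""
    return sum(g(s) * (y(times[k + 1]) - y(times[k]))
               for k, s in enumerate(states))

def rhs_by_parts(states, times, g, y):
    """g(x_t) y_t minus the sum over jump times of y times the jump of g."""
    boundary = g(states[-1]) * y(times[-1])
    jumps = sum(y(times[k]) * (g(states[k]) - g(states[k - 1]))
                for k in range(1, len(states)))
    return boundary - jumps

random.seed(0)
t_end = 1.0
jump_times = sorted(random.uniform(0.0, t_end) for _ in range(5))
times = [0.0] + jump_times + [t_end]        # tau_0 = 0, ..., tau_{m+1} = t
states = [0.0]
for _ in jump_times:                         # nearest-neighbour moves, h = 0.1
    states.append(states[-1] + random.choice((-0.1, 0.1)))

g = math.atan                                # hypothetical bounded smooth g
y = lambda u: math.sin(3.0 * u) + u * u      # hypothetical data path, y(0) = 0

lhs = lhs_integral(states, times, g, y)
rhs = rhs_by_parts(states, times, g, y)
```

The two values agree up to rounding, which is the identity used to pass from (1.6) to (1.7). Only the right-hand form depends on $y(\cdot)$ through its values rather than its increments.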

The expression (1.7) is also ([5]) locally Lipschitz continuous in
$y(\cdot)$ at each $y(\cdot) \in C^s[0,T]$. In Section 2, we define a particular
chain for which (1.7) can be computed and which also converges
to (1.4) for all $y(\cdot)$. In Section 3, we prove the uniform
robustness result; namely that the continuity in $y(\cdot)$ of (1.7)
is uniform in $h$. This uniformity is crucial for robustness.
Some remarks on computation are made in Section 4. Let $F^h_t(y(\cdot))$
and $F_t(y(\cdot))$ denote the value of (1.7) and (1.4), resp., at
$y(\cdot)$.
2.  A Useful Approximating Chain $\{x^h(\cdot)\}$

We next describe the particular approximating chain which is
to be used in Section 3. Other approximating chains can certainly
be used. The main criteria for the choice concern the robustness
and the ease of use in computation for the filtering problem. Let
$R^r_h$ denote the finite difference grid on $R^r$ with difference
parameter $h$. Either $R^r_h$ or a subset of it will be our state
space. Actually, $h$ can be vector valued, the finite difference
interval depending on the direction, but we stick to the simpler
case. The transition function will be stated for the case where
$a(\cdot) = \sigma(\cdot)\sigma'(\cdot)/2$ is diagonal, for notational simplicity. The
expressions for the general case are in [3, Chapter 6.2]. Define

$Q_h(x) = 2 \sum_i a_{ii}(x) + h \sum_i |f_i(x)|$,  $\Delta t^h(x) = h^2/Q_h(x)$,

let $\inf_x (|a(x)| + |f(x)|) > 0$ and let $e_i$ denote the unit vector in
the $i$th coordinate direction. For $x \in R^r_h$ and $y = x \pm e_i h$, set

$p^h(x,y) = [a_{ii}(x) + h f_i^{\pm}(x)]/Q_h(x)$,

where $f^+ = \max[0,f]$, $f^- = \max[0,-f]$; for other $(x,y)$-pairs, set
$p^h(x,y) = 0$. Let $\{\xi^h_n\}$ denote the chain with transition
probabilities $\{p^h(x,y)\}$. Interpolate the chain into a continuous
parameter Markov process, denoted by $x^h(\cdot)$, by defining the
interjump intervals by

$P\{\text{jump after } t + s \mid x^h_t = x\} = \exp[-s/\Delta t^h(x)]$.


The process has the following properties [4, Section 8]:

$E[\text{next state value} - x \mid x = \text{current state value}] = f(x)\Delta t^h(x)$,
$\mathrm{covar}[\text{next state value} - x \mid x = \text{current state value}] = 2a(x)\Delta t^h(x) + o(\Delta t^h(x))$,
$E[x^h_{t+\Delta} - x \mid x^h_t = x] = f(x)\Delta + o(\Delta)$,
$\mathrm{covar}[x^h_{t+\Delta} - x \mid x^h_t = x] = 2a(x)\Delta + o(\Delta)$.
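
The first two properties can be checked directly from the transition probabilities. The following is a minimal one-dimensional sketch; the drift `f`, the coefficient `sigma`, the grid point and the value of `h` are hypothetical choices made only for illustration.

```python
import math

def chain_step(x, h, f, a):
    """One-dimensional transition law of the chain at grid point x.

    Returns the interpolation interval dt = h^2 / Q_h(x) and a dict
    mapping each successor state to its probability.
    """
    fx, ax = f(x), a(x)
    Q = 2.0 * ax + h * abs(fx)                 # normaliser Q_h(x)
    dt = h * h / Q
    p_up = (ax + h * max(0.0, fx)) / Q         # move to x + h
    p_dn = (ax + h * max(0.0, -fx)) / Q        # move to x - h
    return dt, {x + h: p_up, x - h: p_dn}

# Hypothetical bounded coefficients (not taken from the paper).
f = lambda x: -math.tanh(x)
sigma = lambda x: 1.0 + 0.5 * math.cos(x)
a = lambda x: sigma(x) ** 2 / 2.0              # a = sigma * sigma' / 2

h, x = 0.01, 0.3
dt, p = chain_step(x, h, f, a)
mean = sum(prob * (y - x) for y, prob in p.items())
var = sum(prob * (y - x) ** 2 for y, prob in p.items()) - mean ** 2
```

With these choices the mean increment equals `f(x) * dt` up to rounding, while `var / dt` differs from `2 * a(x)` by a term of order `h`, in line with the $o(\Delta t^h(x))$ remainder.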

More details are in [3], [4]. The sequence $\{x^h(\cdot)\}$ converges
weakly in $D^r[0,T]$ to the $x(\cdot)$ of (1.1), where $D^r[0,T]$ is
the space of $R^r$ valued functions on $[0,T]$ which have left-hand
limits and are right continuous. The space is endowed with the
Skorokhod topology. The states of the processes $\{x^h(\cdot)\}$, $\{\xi^h_n\}$ only
communicate with their nearest neighbors.

Next, the main robustness theorem will be proved.
3.  The Robustness Theorem


Theorem 1. $E_t F(x^h_t)$ given by (1.7) is continuous in the
supremum norm at each $y(\cdot) \in C^s[0,T]$, uniformly in (small) $h > 0$.
In fact, for each bounded set $S$ in $C^s[0,T]$, there is a real
$K(S)$ such that $|F^h_t(y(\cdot)) - F^h_t(\bar y(\cdot))| \le K(S)\,\|y - \bar y\|$ for $y(\cdot)$
and $\bar y(\cdot) \in S$. Also, (1.7) converges to (1.4) for
each $y(\cdot) \in C^s[0,T]$.


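Proof. Let $S$ denote a bounded set in $C^s[0,T]$, with $y(\cdot)$
and $\bar y(\cdot) \in S$. We need only show that there is a constant $K_1(S)$
depending only on $S$ such that

(3.1)  $\tilde E \left| \exp\left[ y_t' g(\tilde x^h_t) - \int_0^t y_u'\,dg(\tilde x^h_u) - \frac12 \int_0^t |g(\tilde x^h_u)|^2 du \right] - \exp\left[ \bar y_t' g(\tilde x^h_t) - \int_0^t \bar y_u'\,dg(\tilde x^h_u) - \frac12 \int_0^t |g(\tilde x^h_u)|^2 du \right] \right| \le K_1(S)\,\|y - \bar y\|$

uniformly in $h$ and $t \le T$. We can and will
drop the $-\frac12 \int_0^t |g(\tilde x^h_u)|^2 du$ term, since $g(\cdot)$ is bounded. Then,
using the inequality $|e^x - e^y| \le |x - y|(e^x + e^y)$, we have the upper
bound for (3.1)

$\tilde E \left| (y_t - \bar y_t)' g(\tilde x^h_t) - \int_0^t (y_u - \bar y_u)'\,dg(\tilde x^h_u) \right| \cdot \left| \exp\left[ y_t' g(\tilde x^h_t) - \int_0^t y_u'\,dg(\tilde x^h_u) \right] + \exp\left[ \bar y_t' g(\tilde x^h_t) - \int_0^t \bar y_u'\,dg(\tilde x^h_u) \right] \right|$.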


We need only show (3.2) and (3.3), where $K_2$ is an arbitrary constant:

(3.2)  $\tilde E \left| \int_0^t y_u'\,dg(\tilde x^h_u) \right|^2 \le K_2 \|y\|^2$, uniformly in $t \le T$ and small $h$;

(3.3)  $\tilde E \exp \int_0^t q_u'\,dg(\tilde x^h_u)$ is bounded uniformly in bounded sets of
$q(\cdot) \in C^s[0,T]$ and in (small) $h$ and $t \le T$.
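
One way to see why (3.2) and (3.3) suffice is the Schwarz inequality. The display below is a reconstruction of that step rather than a quotation: $U$ and $V$ are shorthand introduced here for the two factors in the upper bound for (3.1), $\bar y$ denotes the second data path, and $|g|_\infty$ denotes the sup of $|g|$.

```latex
\tilde E\,|U|\,|V| \le \big(\tilde E\,U^2\big)^{1/2}\big(\tilde E\,V^2\big)^{1/2},
\qquad
U = (y_t-\bar y_t)'g(\tilde x^h_t) - \int_0^t (y_u-\bar y_u)'\,dg(\tilde x^h_u),
\\[4pt]
\tilde E\,U^2 \le 2\,|g|_\infty^2\,\|y-\bar y\|^2
   + 2\,\tilde E\Big|\int_0^t (y_u-\bar y_u)'\,dg(\tilde x^h_u)\Big|^2
   \le 2\big(|g|_\infty^2 + K_2\big)\|y-\bar y\|^2 ,
\\[4pt]
\tilde E\,V^2 \le 2\,e^{2|g|_\infty\|y\|}\,
   \tilde E\exp\!\int_0^t (-2y_u)'\,dg(\tilde x^h_u)
   + 2\,e^{2|g|_\infty\|\bar y\|}\,
   \tilde E\exp\!\int_0^t (-2\bar y_u)'\,dg(\tilde x^h_u).
```

The second line uses (3.2) with $y - \bar y$ in place of $y$, and the third uses (3.3) with $q = -2y$ and $q = -2\bar y$, which range over a bounded set when $y, \bar y \in S$.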


First, (3.2) will be proved. Let $\mathcal{B}^h_t$ and $\mathcal{B}(T)$ denote
the minimal $\sigma$-algebra over which $g(\tilde x^h_u)$, $u \le t$, is measurable and
the Borel field over $[0,T]$, resp. It is convenient to use the
decomposition

(3.4)  $g(\tilde x^h_t) = M^h_t + \Gamma^h_t$,

where $M^h(\cdot)$ is a martingale and $\Gamma^h(\cdot)$ is a predictable process;
in particular, $\Gamma^h(t)$ is adapted to $\mathcal{B}^h_t$ and (as an $(\omega,t)$ function),
$\Gamma^h(\cdot)$ is measurable on the sub $\sigma$-algebra of $\mathcal{B}^h_T \times \mathcal{B}(T)$ which
is induced by the left continuous functions. The decomposition (3.4)
is unique and $\Gamma^h(\cdot)$ has the representation $\Gamma^h_t = \int_0^t \gamma^h_s\,ds$, where
$\gamma^h_u = \gamma^h(\tilde x^h_u)$ and $\gamma^h(\cdot)$ is given by

(3.5)  $\gamma^h(x) = \sum_y [g(y) - g(x)]\,p^h(x,y)/\Delta t^h(x)$.

Note that $\Gamma^h(\cdot)$ is the unique predictable function which satisfies

$E[\Gamma^h_{t+s} - \Gamma^h_t \mid \mathcal{B}^h_t] = E[g(\tilde x^h_{t+s}) - g(\tilde x^h_t) \mid \mathcal{B}^h_t]$,
from which $\Gamma^h(\cdot)$ and $\gamma^h(\cdot)$ can be determined. Using a truncated
Taylor expansion on (3.5), and the properties of the chain which were
given in Section 2, and the definition $g_{xx}(x)$ = Hessian of $g(\cdot)$
at $x$, we can write

(3.6)  $\gamma^h(x) = \{[g_x'(x)f(x) + \mathrm{trace}\; a(x)g_{xx}(x)]\Delta t^h(x) + o(\Delta t^h(x))\}/\Delta t^h(x) = \mathcal{L}g(x) + o(\Delta t^h(x))/\Delta t^h(x)$,

where the last term $\to 0$ as $h \to 0$ uniformly in $t \le T$, and in $x$, and
$\mathcal{L}$ is the differential generator of (1.1). In any case $|\gamma^h(x)|$ is
uniformly bounded in (small) $h$ and in $x$. Note that $M^h(\cdot)$ is bounded
on $[0,T]$, since $\gamma^h(\cdot)$ and $g(\tilde x^h(\cdot))$ are.
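
The convergence of $\gamma^h$ to the generator applied to $g$ can be observed numerically. The sketch below works in one dimension with hypothetical coefficients `f`, `a` and a hypothetical scalar `g`; it evaluates the sum in (3.5) and compares it with $g_x f + a\,g_{xx}$.

```python
import math

def gamma_h(x, h, f, a, g):
    """Evaluate (3.5) in one dimension: sum over the two neighbours of x."""
    fx, ax = f(x), a(x)
    Q = 2.0 * ax + h * abs(fx)
    dt = h * h / Q
    p_up = (ax + h * max(0.0, fx)) / Q
    p_dn = (ax + h * max(0.0, -fx)) / Q
    return ((g(x + h) - g(x)) * p_up + (g(x - h) - g(x)) * p_dn) / dt

# Hypothetical bounded coefficients and observation function.
f = lambda x: -math.tanh(x)
a = lambda x: 0.5 * (1.0 + 0.5 * math.cos(x)) ** 2
g = math.sin

x = 0.3
# Generator applied to g = sin: g_x f + a g_xx, with g_x = cos, g_xx = -sin.
generator = math.cos(x) * f(x) + a(x) * (-math.sin(x))
errors = [abs(gamma_h(x, h, f, a, g) - generator) for h in (1e-1, 1e-2, 1e-3)]
```

For this example the entries of `errors` shrink roughly in proportion to `h`, which is what the remainder term in (3.6) predicts.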


Next, let us calculate the quadratic variation of $M^h(\cdot)$.
This is the increasing (in the sense of positive definite matrices)
matrix valued predictable function $A^h(\cdot)$ such that

$M^h_t (M^h_t)' - A^h_t$

is a matrix valued martingale. We have $A^h_t = \int_0^t \lambda^h_s\,ds$, where
$\lambda^h_s = \lambda^h(\tilde x^h_s)$ and $\lambda^h(\cdot)$ is given by

$\lambda^h(x) = \sum_y [g(y) - g(x)][g(y) - g(x)]'\,p^h(x,y)/\Delta t^h(x)$.

$\lambda^h(\cdot)$ is obtained from the characterization of $A^h(\cdot)$ as being
the unique predictable function such that, for all $\delta > 0$,

$E[A^h_{t+\delta} - A^h_t \mid \mathcal{B}^h_t] = E[M^h_{t+\delta}(M^h_{t+\delta})' - M^h_t (M^h_t)' \mid \mathcal{B}^h_t]$.

Note that $|\lambda^h(x)|$ is bounded uniformly in $x$ and $h$.
Now, we are prepared to evaluate (3.2). Write

$\int_0^t y_u'\,dg(\tilde x^h_u) = \int_0^t y_u'\,dM^h_u + \int_0^t y_u' \gamma^h_u\,du$.

Then (3.2) follows from the uniform boundedness of $\gamma^h(\cdot)$ and
$\lambda^h(\cdot)$ and the martingale inequality

$E \max_{t \le T} \left| \int_0^t y_u'\,dM^h_u \right|^2 \le 4\|y\|^2 E \int_0^T dA^h_u = 4\|y\|^2 E \int_0^T \lambda^h_u\,du$.


We now turn to (3.3). It is convenient to bound (3.3) under
the assumption that $\tilde x^h(\cdot)$ is stopped after the $N$th jump, where $N$
is an arbitrary integer. The obtained bound will not depend on $N$.
Fix $t$, set $t = n\delta$, where $n$ is an integer, and write (3.3) as

$A_n = E \prod_{i=0}^{n-1} \exp \int_{[i\delta, i\delta+\delta)} q_u'\,dg(\tilde x^h_u)$.

Let $E^h_{i\delta}$ denote the expectation conditioned on $\mathcal{B}^h_{i\delta}$. Let
$x = \tilde x^h_{i\delta}$. There is a function $o(\cdot)$ which can depend on $N$ and $h$
and on the modulus of continuity of $q(\cdot)$ on $[0,T]$, but is uniform
in $x$ and is such that

(3.7)  $A^i \equiv E^h_{i\delta} \exp \int_{[i\delta, i\delta+\delta)} q_u'\,dg(\tilde x^h_u) \le \left( 1 - \dfrac{\delta}{\Delta t^h(x)} \right) + E^h_{i\delta} \exp\{ \hat q_{i\delta}'[g(x^+) - g(x)] \} \cdot \dfrac{\delta}{\Delta t^h(x)} + o(\delta)$,

where $x^+$ is the successor state to $x$, given one jump in
$[i\delta, i\delta+\delta)$, and $\hat q_{i\delta}$ is the value of $q_u$ at the jump time in
$[i\delta, i\delta+\delta)$, if any.
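Expanding (3.7) yields

$A^i \le (1 - \delta/\Delta t^h(x)) + (\delta/\Delta t^h(x) + o(\delta))\,E^h_{i\delta}\big[ 1 + \hat q_{i\delta}'(g(x^+) - g(x)) + (\hat q_{i\delta}'(g(x^+) - g(x)))^2/2 + o_1\big((\hat q_{i\delta}'(g(x^+) - g(x)))^2\big) \big] + o(\delta)$,

where $o_1(y)/y \to 0$ as $y \to 0$. Thus for some $K_1$, independent of $N$, $h$
and $x$, and an $o(\cdot)$ with the properties of the above $o(\cdot)$ function,

$A^i \le 1 + K_1\delta + o(\delta)$.

Substituting this into the expression for $A_n$ and letting $\delta \to 0$ yields

$A_n \le \exp K_1 T$, independently of $h$ and $N$.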
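
The passage from the one-interval bound to the bound on the product can be spelled out by conditioning successively on the $\sigma$-algebras at times $(n-1)\delta, (n-2)\delta, \ldots, 0$. The display is a sketch of that iteration, written for $t = n\delta \le T$; it is a filled-in step, not a quotation.

```latex
A_n \;\le\; \big(1 + K_1\delta + o(\delta)\big)^n
    \;\le\; \exp\big[n K_1 \delta + n\,o(\delta)\big]
    \;=\; \exp\Big[K_1 t + t\,\frac{o(\delta)}{\delta}\Big]
    \;\longrightarrow\; \exp K_1 t \;\le\; \exp K_1 T
    \qquad (\delta \to 0).
```

The middle inequality uses $1 + u \le e^u$, and the limit uses $o(\delta)/\delta \to 0$ with $h$ and $N$ held fixed; since $K_1$ does not depend on $h$ or $N$, neither does the final bound.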


The last assertion of the theorem can be proved in the same
way that a similar assertion was proved in [3, Chapter 7.5] for an
interpolation of a chain similar to $\{\xi^h_n\}$. Q.E.D.
Remarks. 1. Since (1.7) also converges to (1.3) for almost
all $y(\cdot)$ (Wiener measure), the theorem provides another proof of
Clark's representation (1.4) and its Lipschitz continuity.

2. The uniformity (equi-Lipschitz continuity) in
the theorem is crucial to the value of the result, for otherwise
the "robustness" could well become less and less as $h \to 0$.

3. From an applications point of view, robustness
is important since the conditional moments should be smooth functions
of the data. Otherwise unaccounted-for errors in measurement or errors
in the numerical calculation might render the result meaningless.
Furthermore, in applications $w(\cdot)$ is not usually a Wiener process,
although it is convenient to use a filter designed under the
assumption that it is a Wiener process. Then the robustness idea
is that if $w(\cdot)$ is close to a Wiener process (in some pathwise sense),
then the estimates would also be close to the estimates which would be
obtained if $w(\cdot)$ were actually a Wiener process. We might lose
information by not building a filter which considers the actual
statistical structure of the (non-Wiener) $w(\cdot)$, but that filter would be
much more complicated than the one which is optimal under the Wiener
assumption.
4.  Remarks on Computation


First, let $\Delta > 0$ and let us define a process to be denoted
by $\xi^{h,\Delta}(\cdot)$ and which is essentially $x^h(\cdot)$ but altered so that the
jumps occur only at times $i\Delta$, $i = 1, 2, \ldots$. Let $\Delta \le \Delta t^h(x)$, all $x$.
In particular, set $\xi^{h,\Delta}(t) = \xi^{h,\Delta}_i$ on $[i\Delta, i\Delta + \Delta)$, where $\{\xi^{h,\Delta}_i\}$
is a Markov chain on the state space $R^r_h$ with transition
probabilities

$p^{h,\Delta}(x,y) = p^h(x,y)\,(\Delta/\Delta t^h(x))$ for $y = x \pm e_i h$, and
$p^{h,\Delta}(x,x) = 1 - \sum_{y = x \pm e_i h} p^{h,\Delta}(x,y)$.

The process $\xi^{h,\Delta}(\cdot)$ converges weakly to $x(\cdot)$ as $h, \Delta \to 0$.
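
A minimal one-dimensional sketch of these transition probabilities follows. The function name `discretized_row`, the coefficients and the numerical values are hypothetical; the code only illustrates that the probabilities are nonnegative, sum to one when $\Delta \le \Delta t^h(x)$, and give a mean increment $f(x)\Delta$.

```python
import math

def discretized_row(x, h, delta, f, a):
    """Transition law of the time-discretized chain at grid point x (1-D)."""
    fx, ax = f(x), a(x)
    Q = 2.0 * ax + h * abs(fx)
    dt = h * h / Q
    if delta > dt:
        raise ValueError("delta must not exceed the interval dt^h(x)")
    scale = delta / dt
    p_up = scale * (ax + h * max(0.0, fx)) / Q    # move to x + h
    p_dn = scale * (ax + h * max(0.0, -fx)) / Q   # move to x - h
    p_stay = 1.0 - p_up - p_dn                    # no jump in this step
    return {x + h: p_up, x - h: p_dn, x: p_stay}

# Hypothetical coefficients: bounded drift, constant diffusion (sigma = 1).
f = lambda x: -math.tanh(x)
a = lambda x: 0.5

h, delta, x = 0.05, 0.002, 0.4
row = discretized_row(x, h, delta, f, a)
mean = sum(prob * (y - x) for y, prob in row.items())
```

Because the jump probabilities of the original chain sum to one, the holding probability equals $1 - \Delta/\Delta t^h(x)$, which is why the condition $\Delta \le \Delta t^h(x)$ is needed.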

Let $\tilde\xi^{h,\Delta}(\cdot)$ be independent of $y(\cdot)$ but have the same path
distributions as $\xi^{h,\Delta}(\cdot)$ does. Then with $\tilde\xi^{h,\Delta}(\cdot)$ replacing $\tilde x^h(\cdot)$
in (1.6), it is shown in [3, Chapter 7.5] that (1.6) converges to
(1.3) for almost all $y(\cdot)$ (Wiener measure).

The method used in the proof of the theorem can also be used
with the approximation $\xi^{h,\Delta}(\cdot)$ replacing the approximation $x^h(\cdot)$.
Then both $\Gamma^h(\cdot)$ and $A^h(\cdot)$ will still be predictable, but will be
piecewise constant in the $[i\Delta, i\Delta+\Delta)$ intervals, and the theorem will
continue to hold. Furthermore the difference between (1.6) evaluated
with $\tilde x^h(\cdot)$, and with $\tilde\xi^{h,\Delta}(\cdot)$, goes to zero as $\Delta, h \to 0$, uniformly
in bounded $y(\cdot)$ sets.

In order to have a computationally feasible method for getting
values of (1.6), the state space must be finite. Let $G$ be a closed
hyper-rectangle in $R^r$ and set $G_h = R^r_h \cap G$. Let $\tau = \inf\{t: x_t \in \partial G\}$,
$\tau' = \inf\{t: x_t \notin G\}$, and let $\mu$ denote the measure of $x(\cdot)$. Suppose
that $\partial G$, the boundary of $G$, is regular in the sense that
$P_\mu\{\tau \cap T = \tau' \cap T\} = 1$. Let $x^h(\cdot)$, $x(\cdot)$, $\xi^{h,\Delta}(\cdot)$ denote the
processes stopped on first exit from $G$. Then the theorem remains
valid for these processes replacing $x^h(\cdot)$, $x(\cdot)$, $\xi^{h,\Delta}(\cdot)$ (then
$\tilde x(\cdot)$ and $\tilde x^h(\cdot)$ are also replaced by the stopped processes).

With the use of the stopped process, the state space is finite
and the method of [3, Chapter 7.5] can be used with the $\xi^{h,\Delta}(\cdot)$
approximation. Alternatively, we can use a method of Clark. In [5],
Clark gave a set of ordinary differential equations (not Itô equations)
for realizing (1.7), and the solution of this set, when considered as
a function of $y(\cdot)$, has the same robustness property as (1.7) has.
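
To make the finite-state computation concrete, the sketch below evaluates the ratio (1.6) for a time-discretized, stopped chain in one dimension by a forward recursion on unnormalized weights. It is an illustration under stated assumptions, not the scheme of [3] or the differential equations of [5]: the function names, the coefficients, the grid, and the synthetic noise-free data increments are all hypothetical, and the recursion uses the increments of $y(\cdot)$ as in (1.6) rather than the robust form (1.7).

```python
import math

def build_matrix(grid, h, delta, f, a):
    """Transition matrix of the discretized chain, stopped at the end points."""
    n = len(grid)
    P = [[0.0] * n for _ in range(n)]
    for k, x in enumerate(grid):
        if k == 0 or k == n - 1:
            P[k][k] = 1.0                     # stopped on reaching the boundary
            continue
        Q = 2.0 * a(x) + h * abs(f(x))
        dt = h * h / Q
        if delta > dt:
            raise ValueError("delta must not exceed dt^h(x)")
        r = delta / dt
        P[k][k + 1] = r * (a(x) + h * max(0.0, f(x))) / Q
        P[k][k - 1] = r * (a(x) + h * max(0.0, -f(x))) / Q
        P[k][k] = 1.0 - P[k][k + 1] - P[k][k - 1]
    return P

def filter_step(w, P, gvals, dy, delta):
    """Weight each state by its likelihood over one interval, then move."""
    n = len(w)
    new = [0.0] * n
    for k in range(n):
        wk = w[k] * math.exp(gvals[k] * dy - 0.5 * gvals[k] ** 2 * delta)
        for j in (k - 1, k, k + 1):           # nearest-neighbour transitions
            if 0 <= j < n:
                new[j] += wk * P[k][j]
    total = sum(new)                          # normalizing leaves the ratio unchanged
    return [v / total for v in new]

h, delta = 0.05, 0.002
grid = [-1.0 + k * h for k in range(41)]      # G = [-1, 1]
f = lambda x: -math.tanh(x)                   # hypothetical drift
a = lambda x: 0.5                             # sigma = 1
g = lambda x: x                               # hypothetical observation function

P = build_matrix(grid, h, delta, f, a)
gvals = [g(x) for x in grid]
w = [1.0 / len(grid)] * len(grid)             # uniform initial distribution
for _ in range(200):
    dy = 0.5 * delta                          # synthetic noise-free increment
    w = filter_step(w, P, gvals, dy, delta)
estimate = sum(x * wx for x, wx in zip(grid, w))   # approximates E_t F(x_t), F(x) = x
```

Each step costs a number of operations proportional to the number of grid points, since every state communicates only with its nearest neighbors.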